Generating Complex Morphology for Machine Translation
Abstract
We present a novel method for predicting inflected word forms for generating morphologically rich languages in machine translation. We utilize a rich set of syntactic and morphological knowledge sources from both source and target sentences in a probabilistic model, and evaluate their contribution in generating Russian and Arabic sentences. Our results show that the proposed model substantially outperforms the commonly used baseline of a trigram target language model; in particular, the use of morphological and syntactic features leads to large gains in prediction accuracy. We also show that the proposed method is effective with a relatively small amount of data.
Similar resources
A new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating their parts. Persian also has multi-part words. In Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...
Generating the Translation Equivalent of Agentive Nouns Using Two-Level Morphology
This paper describes the generation of translation equivalents of agentive nouns using automatically learned two-level phonological rules. The system is implemented in the PC-KIMMO environment. The research is based on two lexicons that contain a list of agentive nouns in Macedonian and English, including their components (noun, verb, adjective, pronoun) and ...
Exploring Spanish-morphology effects on Chinese–Spanish SMT
This paper presents some statistical machine translation results among English, Spanish and Chinese, and focuses on exploring Spanish-morphology effects on the Chinese to Spanish translation task. Although not strictly comparable, it is observed that by reducing Spanish morphology the accuracy achieved in the Chinese to Spanish translation task becomes comparable to the one achieved in the Chin...
A Discriminative Lexicon Model for Complex Morphology
This paper describes successful applications of discriminative lexicon models to the statistical machine translation (SMT) systems into morphologically complex languages. We extend the previous work on discriminatively trained lexicon models to include more contextual information in making lexical selection decisions by building a single global log-linear model of translation selection. In offl...
Abu-MaTran at WMT 2016 Translation Task: Deep Learning, Morphological Segmentation and Tuning on Character Sequences
This paper presents the systems submitted by the Abu-MaTran project to the English-to-Finnish language pair at the WMT 2016 news translation task. We applied morphological segmentation and deep learning in order to address (i) the data scarcity problem caused by the lack of in-domain parallel data in the constrained task and (ii) the complex morphology of Finnish. We submitted a neural machine t...